home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The X-Philes (2nd Revision)
/
The X-Philes Number 1 (1995).iso
/
xphiles
/
hp48hor2
/
bnf.doc
< prev
next >
Wrap
Text File
|
1995-03-31
|
15KB
|
340 lines
This is the summary of chapter 11 'RPL READER and Parser Tools' from the HP
Kernel ERS, August 7, 1986; it's a transcription with minor modifications
where neccassary. Thanks to HP Cv. for giving the permission to publish BNF.
-----------------------------------------------------------------------------
11.3 Parser Tools and the BNF parser
The parser tools are modelled after the PRISM system MINI utility which is a
parser generator (pg) program. The definition of a parser, for our purposes,
is a function which takes arguments in the form
ob1 .. obn n toktyptab string offset token
and returns results in the form
ob1' .. obn' n' toktypetab' string offset' token' flag TRUE
or
ob1 .. obn n toktyptab string offset token FALSE
The interpretation of these pieces conforms to the intuitive notion of a
parser. ob1 .. obn are objects previously parsed, and n is a binary int
representing the number of objects. Toktypetab is a token-type table (see
below). String is the string being parsed, token is also a string, but
represents the piece of the string currently being considered. Offset is a
binary int offset into (or beyond) the string and points to the 1st char
not accounted for in the token.
The flags returned have the following interpretation. If FALSE is returned,
then parsing was never started. If TRUE TRUE is returned, then parsing was
started and successfully completed. If FALSE TRUE is returned, then parsing
was started, but not completed (usally an error condition).
The pg 'BNF' (see below) produces a parser from its Backus-Naur-Form
description given as a string. The BNF consists of a sequence of clauses
seperated by '/'s. Each clause is a sequence of parsers. The evaluation of a
BNF of the form A B C ... 1st evaluates A, and if A returns FALSE, then B C
... is ignored and FALSE is returned. If A returns FALSE TRUE, then B C ...
is again ignored and FALSE TRUE is returned. If A returns TRUE TRUE then B C
... is evaluated sequentially until either a FALSE or a FALSE TRUE is
returned, or the sequence completes. In the 1st case FALSE TRUE is returned,
in the 2nd case TRUE TRUE is returned.
A BNF of the form A B / C D is evaluated by evaluating A B as above, and if
the result is FALSE, then evaluate C D as above and return whatever is
returned by this evaluation. On the other hand, if the evaluation of A B
returns TRUE TRUE or FALSE TRUE, then C D is not evaluated, and the result
of A B is returned.
As an example, consider the parser for a list of names:
list:
leftparen terminator
terminator:
rightparen / name terminator / list terminator
From this description, you can see that (), ( ABC ), ( ABC ( CDE ) ), etc.
are all lists.
The pg is invoked by the word 'BNF' and expect a string in level 1 which
contains the BNF description of a parser. The string starts with the token
"BNF" and is ended by the token "ENDBNF". Between these two, and in addition
to the clause seperator "/", the following are allowed:
- an name of a variable in the current directory, assumed to be a parser.
- the token " followed by another token.
- the token CK" followed by another token.
- the token // used in the same way as /
The parser corresponding to " creates a parser using the token that follows.
The parser created doesn't start unless the current token corresponds to that
parser to the creator. If a match is made, then the created parser returns
TRUE TRUE and drops the current token and gets the next one.
The parser corresponding to CK" is nearly identical to the above, exept that
it leaves the current token right where it is.
The // is used as a clause seperator with the same meaning as /, except that
it modifies the manner in which the last object in the terminated clause is
evaluated. It's usefull for tail-recursive calls to a parser which allways
completes or fails, that is never returns FALSE. If such a parser is the last
object in a clause, then it can be evaluated with a COLA. The token //
indicates such a situation to the BNF parser.
11.4 Extensible Parsers
[Some text I didn't understand about !*XTNDPA and LAM !*FAILTOKENS, a
extensible parser sheme, build-in into the RPL development system...]
... I suggest that a bottom-level RAM- or ROM-WORD which designates a parser
should be named !*<name>PA [I've strip the dammed !*].
To further refine the conventions for parser, we need to pick out two
fundamental kinds of parsers. Although many parsers will not fall strictly
in one category or the other, most, if not all, parsers can be decomposed,
BNF-style, into parsers of these two fundamental kinds. The two kinds of
parsers may be called production parsers and reduction parsers. They are
distinguished by their action on tokens and currently parsed objects.
A production parser ignores already parsed objects and only operates on the
string and token to produce a sequence of objects. In stack notation:
ob1 .. obn n toktyptab string offset token -->
ob1 .. obn n ob1' .. obn' n' toktypetab' string offset' token' flag TRUE
or
ob1 .. obn n toktyptab string offset token FALSE
In particular, a production parser returns a single sequence of objects and
the number of objects, if it successfully completes parsing, and leaves the
objects already on the stack unchanged.
In contrast, a reduction parser doesn't alter the current offset or token and
instead takes the sequence or sequences of objects on the stack and rearranges
and/or combines them into new objects or sequences. Most reduction parsers
will take a fixed number of sequences of objects, with restrictions on the
number of objects in each sequence and return a new sequence. In stack
notation:
<seq 1> n1 .. <seq n> nn toktypetab string offset token -->
<seq 1>' n1' .. <seq n>' nn' toktypetab string offset token flag TRUE
A further convention on parser will be that any parser should be, in its net
effect, a production parser. That is to say that even though a parser may be
composed of sub-parsers which may be either production or reduction parser,
the net effect of the parser, when successfully completed, should be that of
a production parser.
11.4.1 An Example
As an example of the use of the BNF pg, the generator is given in terms of
itself (and a few other pieces). In the example, named subparsers which are
production parsers will begin with a capital letter.
[I used this example to create the *WORKING* (I wonder, if HP have ever
check the functionality of their examples ;-) BNF pg in the directory
BNFPG. I also bring the example to work, you can find it in the
subdirectory BNFXMPL (to run it, 1st run the BNFPG-SETUP and install the
resulting library).]
The parser BNFPA produces a parser function from a clause or sequence of
clauses seperated by '/'s or '//'s. Its operation is quite simple, but has
several features which recognize special situations to aid the effeciency of
the resulting function [ROTFLL :-].
In the general case, a clause of the form A B .. C / is transformed into
ID A !*trior :: ID B !*triand .. ID C !*triand TrueTrue ;
and a clause of the form A B .. C // is transformed into
ID A !*trior :: ID B !*triand .. COLA ID C ;
A sequence of clauses is tramsformed into the concatenation of the forms
shown above and, when the ENDBNF is encoutered, a FALSE is appended to the
form and the entire form is collected into a secondary which is then the
parser function.
[you will find an explaination for !*trior and !*triand along with some
other entries below.]
The two special cases arise when there's only one object in a clause.
Normally, a clause of the form A / would produce
ID A !*trior :: TrueTrue ;
This can be replaced by
ID A !*trior TrueTrue
The 2nd case arises when the single-object clause is the final clause in the
BNF description. Something of the form
BNF ... A ENDBNF
would produce
:: ... A !*trior TrueTrue FALSE ;
Since A is (assumed to be) a parser, when it's evaluated it will return the
tristate-flag on top of the stack. The net result is exactly the same as if
A alone were evaluated. For this reason, BNF will produce
:: ... COLA A ;
[As an specific example of the output of the BNF parser, run the BNFPXMPL-
SETUP, store the resulting directory and then decompile e.g. Bnfob using
RPL->.]
11.5 Tokens and Token-Type Tables
[They are telling us what a token is - see the description of the built-in
GetNextToken below]
11.6 BNF Description of the READER
[Description of a program in their RPL development system, similar to ->RPL.
Long and unusefull.]
11.7 Provided Objects
GetNextToken
( hxs $ # --> hxs $ #' $' )
where hxs is a token-type table, $ is a string and # is an offset (in
chars) into the string. Returns the ttt, the string, the offset to the
first char not used in the token, and the token.
The ttt is a a hxs containing 256 elements, one for each char. The table
gives an implicit correspondence between ASCII characters and their TYPE,
which is one of 16 values:
0 - neutral character
1 - normal character
2 - digit
3 - left delimiter
4 - right delimiter
5 - self delimiter
6 - escape character
7 - diphthong start
[Not described in the ERS, but built-in in the HP48 GetNextToken: ]
8 - ?
9 - ?
10 - ?
11 - ?
12 - ?
13 - comment toggle
14 - comment off
15 - ?
GetNextToken scans the string beginning at the offset until it finds the
1st non-neutral character (or the end of the string). If this char is a
left-delimiter, a right-delimiter or a self-delimiter, then this char is
returned as the token and the new offset points just beyond it.
If the 1st non-neutral char is a dipthong-start, then this char and the
following char are returned as the token.
If none of the above have occured, then the type of the token is determined
from the type of the 1st char, with the exception that a char of type 6
forces itself and the char following it to be interpreted as type 1. Then
chars from the string are accumulated into the token until a type change
occurs, or the offset gos beyond the end of the string. The accumulated
token (with chars of a uniform type) is returned and the offset returned
points beyond the last char which was added to the token.
Note that since one of the arguments for a parser is a token, each parser
that successsfully completes parsing must provide the token for the next
parser in line for evaluation. In particular, the ttt for entry into a
given parser is determined by the previous successfully completed parser.
[Comments:
Digits are treated as ordinary chars if they follow an ordinary char.
Chars between two comment toggles behave as neutral chars.]
!*Tok [PTR BC15]
( hxs $ # $' $'' --> hxs $ # $' FALSE )
( hxs $ # $' $'' --> hxs $''' #' $'''' TRUE TRUE )
If the string on the top of the stack matches the string just beneath it,
the top two strings are dropped, GetNextToken is evaluated and TRUE TRUE is
subsequently returnde. Otherwise, the top string is dropped and FALSE is
returned.
!*tokck [PTR BC47]
( hxs $ # $' $'' --> hxs $ # $' FALSE )
( hxs $ # $' $'' --> hxs $ # $' TRUE TRUE )
If the top two strings match, the top string is dropped and TRUE TRUE is
returned, otherwise the top string is dropped and FALSE is returned.
!*trior
( FALSE --> ? )
( FALSE TRUE --> FALSE TRUE )
( TRUE TRUE --> ? )
In the 1st case, FALSE and the 1st object in the top body of the runstream
are dropped, and execution resumes at the (former) second object. In the
2nd case, the entire top body in the runstream is dropped. In the 3rd case,
TRUE TRUE and the top body in the runstream are dropped, and the (former)
1st object in the runstream is evaluated.
!*triand
( FALSE --> FALSE TRUE )
( FALSE TRUE --> FALSE TRUE )
( TRUE TRUE --> ? )
In the 1st case, TRUE is pushed on the stack; in either the 1st or the 2nd
case, the top body in the runstream is dropped. In the 3rd case, the flags
are dropped, and evaluation continues normally.
TrueTrue
( --> TRUE TRUE )
failed
( --> FALSE TRUE )
-----------------------------------------------------------------------------
The BNF parser generator:
In the BNF.DIR directory you'll find the following variables:
BNFXMPL - BNF example: the BNF pg in terms of itself
SETUP - Run run this to create a BNF pg library (id 800) in stack
level 1. The <-RPL-> and <-LIB-> libraries must be
installed for it to work.
All following variables are source strings for ->RPL:
BED - 'BNF EDitor', nice name .. ;-)
BNF - BNF pg user interface, feed this program with a BNF def.
BNFPA - BNF pg; named !*BNFPA in the ERS
Bnfterm - term parser
Bnfcls - clause parser
Bnfob - object parser
Maketoken - " pg
Makeck - CK" pg
CompileID - object pg
fincls - normal clause end
colacls - COLA clause end
inscola - append COLA to meta object
&ob - append two meta objects
&nostart - append 'failed' to meta object
bnfsav - save BNF pg context
bnfrst - restore BNF pg context
TTT - token-type table
In the BNFXMPL directory you'll find the following variables (source strings,
ready for compiling/BNF pg'ing) :
SETUP - Run it to genarate a directory in stack level 1. The
<-RPL->, <-LIB-> and BNF libraries must be installed for
it to work.
bnf - Same as BNF above. Change name to lower case, because of
the conflict with the lib-word BNF in the BNF lib.
BNFPA - BNF pg in BNF notation
Bnfterm - term parser in BNF notation
Bnfcls - clause parser in BNF notation
Bnfob - object parser in BNF notation
Maketoken - " pg
Makeck - CK" pg
CompileID - object pg
startbnf - start BNF pg
finbnf - close BNF pg
startcls - start clause
fincls - normal clause end
colacls - COLA clause end
contcls - next clause
inscola - append COLA to meta object
&ob - append two meta objects
&nostart - append 'ID failed' to meta object
bnfsav - save BNF pg context
bnfrst - restore BNF pg context
failed - FALSE TRUE
TTT - Token-type table